gradient quantization
Prompt Certified Machine Unlearning with Randomized Gradient Smoothing and Quantization
The right to be forgotten calls for efficient machine unlearning techniques that make trained machine learning models forget a cohort of data. The combination of training and unlearning operations in traditional machine unlearning methods often leads to high computational costs on large-scale data. This paper presents a prompt certified machine unlearning algorithm, PCMU, which executes a one-time operation of simultaneous training and unlearning in advance for a series of machine unlearning requests, without knowledge of the removed/forgotten data. First, we establish a connection between randomized smoothing for certified robustness on classification and randomized smoothing for certified machine unlearning on gradient quantization. Second, we propose a prompt certified machine unlearning model based on randomized data smoothing and gradient quantization. We theoretically derive the certified radius R with respect to the data change before and after data removals, and the certified budget of data removals with respect to R. Last but not least, we present another practical framework of randomized gradient smoothing and quantization, motivated by the difficulty of producing high-confidence certificates in the first framework. We theoretically demonstrate the certified radius R' with respect to the gradient change, the correlation between the two types of certified radii, and the certified budget of data removals with respect to R'.
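The abstract links certified robustness to certified unlearning through randomized smoothing. The sketch below illustrates only that analogy, under our own assumptions: a single gradient coordinate is sign-quantized by majority vote over Gaussian-perturbed copies, so the two quantization levels play the role of classes, and the standard L2 certificate from randomized smoothing for classification, R = sigma / 2 * (Phi^-1(p_A) - Phi^-1(p_B)), bounds how far the coordinate can move before its quantized value flips. The function names, the sign quantizer, and the plug-in probability estimate are hypothetical; this is not PCMU's algorithm or its certified radius.

```python
import random
from statistics import NormalDist

_PHI_INV = NormalDist().inv_cdf  # standard normal quantile function Phi^-1


def certified_radius(p_a: float, p_b: float, sigma: float) -> float:
    """L2 radius of the randomized-smoothing certificate for classification:
    R = sigma / 2 * (Phi^-1(p_a) - Phi^-1(p_b))."""
    return 0.5 * sigma * (_PHI_INV(p_a) - _PHI_INV(p_b))


def smoothed_sign_quantize(g: float, sigma: float, n: int = 10_000, seed: int = 0):
    """Hypothetical example: quantize one gradient coordinate to {-1, +1} by majority
    vote over n Gaussian-perturbed copies, and return (label, certified radius)."""
    rng = random.Random(seed)
    n_pos = sum(1 for _ in range(n) if g + rng.gauss(0.0, sigma) >= 0.0)
    p_pos = n_pos / n
    label = 1 if p_pos >= 0.5 else -1
    # Plug-in estimate of the top-class probability, clipped below 1 so Phi^-1 stays
    # finite. A sound certificate would use a confidence lower bound instead.
    p_a = min(max(p_pos, 1.0 - p_pos), 1.0 - 1.0 / n)
    # With two classes, p_b = 1 - p_a.
    return label, certified_radius(p_a, 1.0 - p_a, sigma)
```

For a scalar coordinate, P(g + noise >= 0) = Phi(g / sigma), so the radius recovers |g|: `smoothed_sign_quantize(0.5, 1.0)` should return label 1 and a radius close to 0.5, meaning any change to the coordinate smaller than that leaves the quantized value unchanged.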
QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding
Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to their excellent scalability properties. A fundamental barrier when parallelizing SGD is the high bandwidth cost of communicating gradient updates between nodes; consequently, several lossy compression heuristics have been proposed, by which nodes only communicate quantized gradients. Although effective in practice, these heuristics do not always guarantee convergence, and it is not clear whether they can be improved. In this paper, we propose Quantized SGD (QSGD), a family of compression schemes for gradient updates which provides convergence guarantees. QSGD allows the user to smoothly trade off communication bandwidth and convergence time: nodes can adjust the number of bits sent per iteration, at the cost of possibly higher variance. We show that this trade-off is inherent, in the sense that improving it past some threshold would violate information-theoretic lower bounds. QSGD guarantees convergence for convex and non-convex objectives, under asynchrony, and can be extended to stochastic variance-reduced techniques. When applied to training deep neural networks for image classification and automated speech recognition, QSGD leads to significant reductions in end-to-end training time. For example, on 16 GPUs, we can train the ResNet152 network to full accuracy on ImageNet 1.8x faster than the full-precision variant.
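The QSGD quantizer rounds each coordinate, measured relative to the vector's L2 norm, to one of s evenly spaced levels, picking between the two neighboring levels at random so that the quantized vector is an unbiased estimate of the gradient. A minimal sketch, assuming a plain Python list as the gradient and a caller-supplied random generator; the lossless encoding step the paper pairs with the quantizer is omitted, and `qsgd_quantize` is a hypothetical name.

```python
import math
import random


def qsgd_quantize(v, s, rng):
    """QSGD-style stochastic quantizer with s levels per sign:
    Q(v_i) = ||v||_2 * sign(v_i) * xi_i, where xi_i is l/s or (l+1)/s,
    chosen so that E[Q(v)] = v."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0] * len(v)
    out = []
    for x in v:
        r = abs(x) / norm * s  # position of |x| on the grid {0, 1, ..., s}
        lower = math.floor(r)
        # Round up with probability r - lower, which keeps the estimate unbiased.
        level = lower + 1 if rng.random() < r - lower else lower
        out.append(math.copysign(norm * level / s, x))
    return out
```

Only the norm, the signs, and small integer levels in {0, ..., s} need to be sent; raising s lowers the variance of the estimate at the cost of more bits per coordinate, which is the bandwidth/convergence trade-off described above.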
Training with Fewer Bits: Unlocking Edge LLMs Training with Stochastic Rounding
Liu, Taowen, Andronic, Marta, Gündüz, Deniz, Constantinides, George A.
LLM training is resource-intensive. Quantized training improves computational and memory efficiency but introduces quantization noise, which can hinder convergence and degrade model accuracy. Stochastic Rounding (SR) has emerged as a theoretically attractive alternative to deterministic rounding, offering unbiased gradient estimates. However, its interaction with other training factors, especially batch size, remains underexplored. In this paper, we present a theoretical and empirical study of mini-batch stochastic gradient descent (SGD) with SR, showing that increased batch sizes can compensate for reduced precision during back-propagation. Furthermore, we show that quantizing weights and quantizing activations affect gradient variance in distinct ways. Our experiments validate these theoretical insights.
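Stochastic rounding moves a value up or down to a neighboring grid point with probabilities proportional to proximity, so its expectation equals the input; deterministic round-to-nearest, by contrast, maps 0.3 to 0 every time on a unit grid. The sketch below shows why a larger mini-batch can offset coarse precision: independent rounding errors average out, so the mean squared error of the batch-mean gradient shrinks roughly as 1/B. Rounding independently per example, synthetic uniform gradients, and the helper names are our simplifications, not the paper's setup, which quantizes weights and activations during back-propagation.

```python
import math
import random


def stochastic_round(x: float, step: float, rng: random.Random) -> float:
    """Round x to a multiple of `step`, rounding up with probability equal to the
    fractional distance above the lower grid point, so that E[result] == x."""
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower
    return step * (lower + (1 if rng.random() < frac else 0))


def sr_batch_mean_mse(batch_size: int, step: float, trials: int, rng: random.Random) -> float:
    """Hypothetical helper: mean squared error of a mini-batch mean whose per-example
    gradients are each stochastically rounded (assumed independent across examples)."""
    total = 0.0
    for _ in range(trials):
        grads = [rng.uniform(-1.0, 1.0) for _ in range(batch_size)]  # synthetic gradients
        exact = sum(grads) / batch_size
        rounded = sum(stochastic_round(g, step, rng) for g in grads) / batch_size
        total += (rounded - exact) ** 2
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    for b in (1, 16, 256):
        print(b, sr_batch_mean_mse(b, 0.25, 500, rng))  # shrinks roughly as 1/b
```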
Computation- and Communication-Efficient Online FL for Resource-Constrained Aerial Vehicles
Pervej, Ferdous, Jin, Richeng, Chowdhury, Md Moin Uddin, Singh, Simran, Güvenç, İsmail, Dai, Huaiyu
Privacy-preserving distributed machine learning (ML) and aerial connected vehicle (ACV)-assisted edge computing have drawn significant attention lately. Since the onboard sensors of ACVs can capture new data as they move along their trajectories, the continual arrival of such newly sensed data calls for online learning and carefully crafted trajectories. Moreover, as typical ACVs are inherently resource-constrained, computation- and communication-efficient ML solutions are needed. Therefore, we propose a computation- and communication-efficient online aerial federated learning (2CEOAFL) algorithm that exploits the continually sensed data while respecting the limited onboard resources of the ACVs. In particular, considering that independently owned ACVs act as selfish data collectors, we first model their trajectories according to their respective time-varying data distributions. We then propose a 2CEOAFL algorithm that allows the flying ACVs to (a) prune the received dense ML model to make it shallow, (b) train the pruned model, and (c) probabilistically quantize and offload their trained accumulated gradients to the central server (CS). Our extensive simulation results show that the proposed 2CEOAFL algorithm delivers performance comparable to its non-pruned and non-quantized (and hence computation- and communication-inefficient) counterparts.
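The client-side steps (a) to (c) can be sketched as a single function. Every specific choice here is an assumption for illustration, since the abstract does not give the actual rules: magnitude-based pruning with a fixed keep ratio, masking the gradients of pruned weights, summing gradients over local steps, and unbiased stochastic quantization onto 2**bits evenly spaced levels between the vector's minimum and maximum. All function names are hypothetical.

```python
import random


def magnitude_prune_mask(weights, keep_ratio):
    """Hypothetical pruning rule: keep the largest-|w| fraction of weights."""
    k = max(1, int(round(keep_ratio * len(weights))))
    keep = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(weights))]


def probabilistic_quantize(v, bits, rng):
    """Unbiased stochastic quantization onto 2**bits evenly spaced levels in [min(v), max(v)]."""
    lo, hi = min(v), max(v)
    if hi == lo:
        return list(v)
    levels = 2 ** bits - 1
    delta = (hi - lo) / levels
    out = []
    for x in v:
        pos = (x - lo) / delta
        base = int(pos)  # floor, since pos >= 0
        if base >= levels:  # x sits on the top level
            out.append(hi)
            continue
        # Round up with probability equal to the fractional position (unbiased).
        step_up = 1 if rng.random() < pos - base else 0
        out.append(lo + (base + step_up) * delta)
    return out


def client_round(weights, local_grads, keep_ratio, bits, rng):
    """(a) prune, (b) accumulate masked local gradients, (c) quantize before offloading."""
    mask = magnitude_prune_mask(weights, keep_ratio)
    acc = [0.0] * len(weights)
    for g in local_grads:  # one gradient vector per local training step
        acc = [a + m * gi for a, m, gi in zip(acc, mask, g)]
    return mask, probabilistic_quantize(acc, bits, rng)
```

Under this scheme only the two range endpoints and a `bits`-bit level index per entry need to be offloaded, which is where the communication savings would come from.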
QLESS: A Quantized Approach for Data Valuation and Selection in Large Language Model Fine-Tuning
Ananta, Moses, Adilazuarda, Muhammad Farid, Zuhri, Zayd Muhammad Kawakibi, Purwarianti, Ayu, Aji, Alham Fikri
Fine-tuning large language models (LLMs) is often constrained by the computational costs of processing massive datasets. We propose QLESS (Quantized Low-rank Gradient Similarity Search), which integrates gradient quantization with the LESS framework to enable memory-efficient data valuation and selection. QLESS employs a two-step compression process: first, it obtains low-dimensional gradient representations through LoRA-based random projection; then, it quantizes these gradients to low-bitwidth representations. Experiments on multiple LLM architectures (LLaMA, Mistral, Qwen) and benchmarks (MMLU, BBH, TyDiQA) show that QLESS achieves comparable data selection performance to LESS while reducing memory usage by up to 16x. Even 1-bit gradient quantization preserves data valuation quality. These findings underscore QLESS as a practical, scalable approach to identifying informative examples within strict memory constraints.
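The two-step compression can be sketched as a random projection followed by low-bit quantization, after which training examples are ranked by similarity to a target gradient. This is a toy sketch under stated assumptions: a dense Gaussian projection stands in for the LoRA-based projection, symmetric absmax rounding (plain sign for 1 bit) stands in for the paper's quantizer, and cosine similarity is assumed as the selection score. None of the names come from QLESS or LESS.

```python
import math
import random


def random_projection(dim_in, dim_out, seed=0):
    """Dense Gaussian projection matrix (dim_out x dim_in), scaled by 1/sqrt(dim_out)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim_in)] for _ in range(dim_out)]


def project(P, g):
    return [sum(p * x for p, x in zip(row, g)) for row in P]


def quantize(v, bits):
    """Assumed quantizer: symmetric absmax rounding; bits == 1 reduces to sign(v)."""
    if bits == 1:
        return [1.0 if x >= 0 else -1.0 for x in v]
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(x) for x in v) / qmax or 1.0  # avoid dividing by zero for v == 0
    return [round(x / scale) * scale for x in v]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_top_k(train_grads, target_grad, P, bits, k):
    """Rank training examples by cosine similarity of their compressed gradients
    to the compressed target gradient and return the indices of the top k."""
    tq = quantize(project(P, target_grad), bits)
    scores = [cosine(quantize(project(P, g), bits), tq) for g in train_grads]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

For scale, storing one bit instead of a 16-bit float per projected coordinate shrinks that storage 16-fold.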